Sketch \star-metric: Comparing Data Streams via Sketching
نویسندگان
چکیده
In this paper, we consider the problem of estimating the distance between any two large data streams in smallspace constraint. This problem is of utmost importance in data intensive monitoring applications where input streams are generated rapidly. These streams need to be processed on the fly and accurately to quickly determine any deviance from nominal behavior. We present a new metric, the Sketch ⋆-metric, which allows to define a distance between updatable summaries (or sketches) of large data streams. An important feature of the Sketch ⋆-metric is that, given a measure on the entire initial data streams, the Sketch ⋆-metric preserves the axioms of the latter measure on the sketch (such as the non-negativity, the identity, the symmetry, the triangle inequality but also specific properties of the f -divergence). Extensive experiments conducted on both synthetic traces and real data allow us to validate the robustness and accuracy of the Sketch ⋆-metric.
منابع مشابه
Sketch ?-metric: Comparing Data Streams via Sketching RESEARCH REPORT
In this paper, we consider the problem of estimating the distance between any two large data streams in smallspace constraint. This problem is of utmost importance in data intensive monitoring applications where input streams are generated rapidly. These streams need to be processed on the fly and accurately to quickly determine any deviance from nominal behavior. We present a new metric, the S...
متن کاملComparing Data Streams via Sketching
We consider the problem of estimating the distance between any two large data streams in smallspace constraint. This problem is of utmost importance in data intensive monitoring applications where input streams are generated rapidly. These streams need to be processed on the fly and accurately to quickly determine any deviance from nominal behavior. We present a new metric, the Sketch ⋆-metric,...
متن کاملCorrections to “LD-Sketch: A Distributed Sketching Design for Accurate and Scalable Anomaly Detection in Network Data Streams”
In this article, we describe the corrections to our paper “LD-Sketch: A Distributed Sketching Design for Accurate and Scalable Anomaly Detection in Network Data Streams” published at IEEE INFOCOM 2014. We also clarify the complexity issue raised by some readers. 1 Corrections to Lemmas and Theorems
متن کاملImproved Sketching of Hamming Distance with Error Correcting
We address the problem of sketching the hamming distance of data streams. We present a new notion of sketching technique, Fixable sketches and we show that using such sketch not only we reduce the sketch size, but also restore the differences between the streams. Our contribution: For two streams with hamming distance bounded by k we show a sketch of size O(k logn) with O(logn) processing time ...
متن کاملSketch-Based Multi-query Processing over Data Streams
Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments; examples include large telecom and IP network installati...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1207.6465 شماره
صفحات -
تاریخ انتشار 2012